Semi-Supervised Answer Extraction from Discussion Forums
Abstract
Mining online discussions to extract answers is an important research problem. Methods proposed in the past used supervised classifiers trained on labeled data. However, collecting training data for each target forum is labor-intensive and time-consuming, which limits their deployment. A recent approach proposed extracting answers in an unsupervised manner, using their repetitions as cues. However, this assumption, that answers are repeated within a discussion, does not hold in many cases. In this paper, we propose two semi-supervised methods for extracting answers from discussions, which use the large amount of available unlabeled data alongside a very small training set to improve accuracy. We show that performance can be boosted by introducing a related, parallel task: identifying acknowledgments of the answers. Our experiments show that the accuracy achieved by our approaches surpasses the baselines by a wide margin.
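The abstract does not say how the two semi-supervised methods work. As a generic illustration only, the sketch below shows self-training, one common semi-supervised technique: a classifier trained on a small labeled set pseudo-labels the unlabeled posts it is most confident about, adds them to the training set, and retrains. It is not presented as the authors' method. The two-dimensional feature vectors, the toy nearest-centroid classifier, and the names `self_train`, `min_margin`, and `max_rounds` are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def centroid(points: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


class NearestCentroid:
    """Toy binary classifier: label 1 = answer post, 0 = non-answer post (hypothetical)."""

    def fit(self, xs: List[Vector], ys: List[int]) -> "NearestCentroid":
        # Assumes both labels (0 and 1) occur at least once in ys.
        self.centroids: Dict[int, List[float]] = {
            label: centroid([x for x, y in zip(xs, ys) if y == label])
            for label in (0, 1)
        }
        return self

    def predict_with_margin(self, x: Vector) -> Tuple[int, float]:
        """Return (label, margin); margin is the gap between the two centroid distances."""
        d0 = math.dist(x, self.centroids[0])
        d1 = math.dist(x, self.centroids[1])
        return (1 if d1 < d0 else 0), abs(d0 - d1)


def self_train(labeled_x, labeled_y, unlabeled_x, min_margin=0.5, max_rounds=10):
    """Generic self-training loop; min_margin and max_rounds are hypothetical knobs."""
    xs, ys = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    model = NearestCentroid().fit(xs, ys)
    for _ in range(max_rounds):
        confident, remaining = [], []
        for x in pool:
            label, margin = model.predict_with_margin(x)
            # Only predictions far enough from the decision boundary become pseudo-labels.
            if margin >= min_margin:
                confident.append((x, label))
            else:
                remaining.append(x)
        if not confident:
            break  # no confident predictions left; stop early
        for x, label in confident:
            xs.append(x)
            ys.append(label)
        pool = remaining
        model = NearestCentroid().fit(xs, ys)  # retrain on labeled + pseudo-labeled data
    # Return the final model and how many unlabeled posts were pseudo-labeled.
    return model, len(xs) - len(labeled_x)


# Usage with made-up feature vectors: one labeled answer, one labeled non-answer.
model, n_added = self_train(
    [[0.9, 0.8], [0.1, 0.2]],
    [1, 0],
    [[0.85, 0.9], [0.2, 0.1], [0.8, 0.7], [0.15, 0.25], [0.5, 0.5]],
)
print(n_added)  # → 4 ([0.5, 0.5] stays unlabeled as too ambiguous)
```

The confidence threshold is the main design lever in such a loop: set too low, early mistakes get pseudo-labeled and reinforced; set too high, little of the unlabeled data is ever used.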
Similar resources
Identifying Products in Online Cybercrime Marketplaces: A Dataset for Fine-grained Domain Adaptation
One weakness of machine-learned NLP models is that they typically perform poorly on out-of-domain data. In this work, we study the task of identifying products being bought and sold in online cybercrime forums, which exhibits particularly challenging cross-domain effects. We formulate a task that represents a hybrid of slot-filling information extraction and named entity recognition and annotat...
Semi-supervised and Unsupervised Methods for Categorizing Posts in Web Discussion Forums
Semi-supervised and unsupervised methods for categorizing posts in web discussion forums. Krish Perumal, Master of Science, Graduate Department of Computer Science, University of Toronto, 2016. Web discussion forums are used by millions of people worldwide to share information belonging to a variety of domains such as automotive vehicles, pets, sports, etc. They typically contain posts that fall into...
Semi-supervised and unsupervised categorization of posts in Web discussion forums using part-of-speech information and minimal features
Web discussion forums typically contain posts that fall into different categories such as question, solution, feedback, spam, etc. Automatic identification of these categories can aid information retrieval that is tailored for specific user requirements. Previously, a number of supervised methods have attempted to solve this problem; however, these depend on the availability of abundant trainin...
Reusing Discussion Forums as Learning Resources in WBT Systems
Discussion forums are highly popular and widely applied tools in Web-based training (WBT) systems. Usually, discussion forums are places where learners discuss topics related to courseware, their current learning task, or the learning project they are working on. These discussion forums contain tremendous educational potential for future learners, since they contain question and answer dialogue...
L'analyse de l'émotion dans les forums de santé (Analysis of Emotion in Health Fora) [in French]
Analysis of Emotion in Health Fora. Studies about emotion in fora are numerous in Linguistics and Psychology. This contribution approaches this subject from an Information and Communication Sciences point of view, and studies emotion as a criterion of pertinence for patients in a health forum. This paper introduces the empirical step of automatic language processing in order to answer this questi...